Method and apparatus for performing integer operations in response to a result of a floating point operation

ABSTRACT

A method and apparatus for performing a move mask operation. The present invention provides a method and apparatus for performing operations on packed data values of a first size and format and conversion of the results to data of a second size and format by eliminating redundant data. The present invention is useful, for example, when comparisons are performed on floating point data that is typically larger (e.g., 64 bits) than integer data (e.g., 32 bits) and integer operations are performed based on the result. Because many processors branch based on integer data, the comparison results stored as floating point data must be transferred to an integer register prior to branching. The present invention takes advantage of redundancy of the floating point comparison results to transfer enough data to convey the comparison result to integer registers with a single instruction.

FIELD OF THE INVENTION

[0001] The present invention relates to computer systems. More specifically, the present invention relates to performing integer operations based on results of floating point operations.

BACKGROUND OF THE INVENTION

[0002] Prior art processors typically perform comparisons of data, including integer data, floating point data and packed data. Such comparison operations are often used when determining whether branching should occur. For example, in a branch if greater than operation, two numbers are compared and a branch is taken if the first number is greater than the second number. Otherwise, the branch is not taken. The most basic comparisons are of two integer numbers.

[0003] In some applications, such as three-dimensional graphics, many numbers are compared to determine the “location” of various objects with respect to each other. In such applications, comparisons are performed more efficiently by operating on packed data. Packed data generally refers to the representation of multiple values by a single number. For example, four eight-bit integer numbers may be represented by a single 32-bit number having four eight-bit segments equivalent to the four eight-bit numbers. Thus, the significance given to various bit placements is altered from standard 32-bit values in order to accurately represent a greater number of smaller values. By performing a compare on the 32-bit packed data, four eight-bit integer compares are accomplished with a single compare operation. Similarly, packed data comparisons may be performed on floating point data.
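The packing of four eight-bit numbers into a single 32-bit number can be sketched in C as follows. This is only an illustration: the function names `pack4x8` and `unpack8` are hypothetical, and placing the first value in bits 0-7 is an assumption, since the text above does not fix a segment order.

```c
#include <stdint.h>

/* Pack four 8-bit values into one 32-bit word.
 * Assumption: v0 occupies bits 0-7, v1 bits 8-15, and so on. */
uint32_t pack4x8(uint8_t v0, uint8_t v1, uint8_t v2, uint8_t v3)
{
    return (uint32_t)v0 | ((uint32_t)v1 << 8) |
           ((uint32_t)v2 << 16) | ((uint32_t)v3 << 24);
}

/* Recover the 8-bit value stored in segment `lane` (0 to 3). */
uint8_t unpack8(uint32_t packed, unsigned lane)
{
    return (uint8_t)(packed >> (8u * lane));
}
```

With this layout, `pack4x8(0x11, 0x22, 0x33, 0x44)` yields the 32-bit value `0x44332211`, and each segment can be read back independently with `unpack8`.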

[0004] Because many prior art processors branch on integer operations and many applications operate on floating point data, what is needed is an improved method and apparatus for performing branch instructions based on integer instructions in response to results of floating point operations.

SUMMARY OF THE INVENTION

[0005] A method and apparatus for performing a move mask operation is described. An operation is performed on floating point data and data is extracted from a result of the operation. The data includes a set of one or more bits where each bit represents multiple redundant bits in the result of the floating point operation. The set of one or more bits is transferred to an integer register and an operation is performed in response to the set of one or more bits.

BRIEF DESCRIPTION OF THE DRAWINGS

[0006] The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements.

[0007] FIG. 1 is one embodiment of a computer system.

[0008] FIG. 2 is one embodiment of an architectural block diagram of a register set and arithmetic circuitry.

[0009] FIG. 3 is one embodiment of a packed data format.

[0010] FIG. 4 is one embodiment of the result of a compare operation performed on two packed data values.

[0011] FIG. 5 is one example of compare, move and branch operations.

[0012] FIG. 6 is one embodiment of a flow diagram for a move mask operation.

DETAILED DESCRIPTION

[0013] A method and apparatus for performing a move mask operation is described. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the present invention.

[0014] The present invention provides a method and apparatus for performing operations on packed data values of a first size and conversion of the results stored in the first size to data of a second size by eliminating redundant data. The present invention is useful, for example, when operations are performed on floating point data that is typically larger (e.g., 64 bits) than integer data (e.g., 32 bits) and integer operations are performed based on the floating point result. Because many processors branch based on integer data, the comparison results stored as floating point data must be transferred to an integer register prior to branching. The present invention takes advantage of redundancy of the floating point comparison results to transfer enough data to convey the comparison result to integer registers with a single instruction.

[0015] FIG. 1 is one embodiment of a computer system. Computer system 100 comprises bus 101 or other device for communicating information, and processor 102 coupled with bus 101 for processing information. Processor 102 may be a complex instruction set computer (CISC) processor, a reduced instruction set computer (RISC) processor, a very long instruction word (VLIW) processor, or any other type of processor. In one embodiment, processor 102 is a processor in the Pentium® family of processors available from Intel Corporation of Santa Clara, Calif. Of course, other processors may also be used. In one embodiment, processor 102 includes one or more register sets for storing integer and/or floating point values.

[0016] Computer system 100 further comprises random access memory (RAM) or other dynamic storage device 104 (referred to as main memory), coupled to bus 101 for storing information and instructions to be executed by processor 102. Main memory 104 also may be used for storing temporary variables or other intermediate information during execution of instructions by processor 102. Computer system 100 also comprises read only memory (ROM) and/or other static storage device 106 coupled to bus 101 for storing static information and instructions for processor 102. Data storage device 107 is coupled to bus 101 for storing information and instructions.

[0017] Data storage device 107, such as a magnetic disk or optical disc and corresponding drive, can be coupled to computer system 100. Computer system 100 can also be coupled via bus 101 to display device 121, such as a cathode ray tube (CRT) or liquid crystal display (LCD), for displaying information to a computer user. Alphanumeric input device 122, including alphanumeric and other keys, is typically coupled to bus 101 for communicating information and command selections to processor 102. Another type of user input device is cursor control 123, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 102 and for controlling cursor movement on display 121.

[0018] In one embodiment, computer system 100 provides graphics functionality. Main memory 104 stores sequences of instructions to generate and display graphical or visual displays on display device 121. Processor 102 executes the sequences of instructions to cause display device 121 to display the resulting graphical or video image. The sequences of instructions may respond to user input provided via alphanumeric input device 122, cursor control device 123, or some other input device (not shown in FIG. 1). Of course, other systems may also provide graphics functionality or may use the present invention for purposes other than graphics, such as numerical analysis or other mathematical applications.

[0019] FIG. 2 is one embodiment of an architectural block diagram of a register set and arithmetic circuitry. The components of FIG. 2 may be part of processor 102 of FIG. 1, or may be included in other circuitry of computer system 100, either shown or not shown in FIG. 1.

[0020] The present invention is described in terms of floating point registers and integer registers. It is important to note that any register architecture may be used with the present invention. Some architectures, for example, provide a predetermined number of integer registers and a predetermined number of floating point registers. Alternatively, an architecture may provide a pool of registers from which registers may be used for either integer or floating point use, such as in a processor that uses a register renaming scheme.

[0021] It is also important to note that what is called a register may be multiple registers treated as a single register. For example, a processor may provide multiple 64-bit registers that may be used as integer registers. Within the same architecture, two 64-bit registers may store the upper 64 bits and the lower 64 bits of a floating point number and be treated as a single 128-bit floating point register. Alternative architectures may also be used.

[0022] In general, the components of FIG. 2 provide floating point computation and integer computation functionality. Floating point registers 200 store floating point data to be used in operations performed by floating point arithmetic circuitry 205.

[0023] Integer registers 210 store integer data in registers for use in operations performed by integer arithmetic circuitry 215. Integer registers 210 are coupled to floating point registers 200 by transfer circuitry 230. Transfer circuitry 230 may be any circuitry that transfers data from floating point registers in floating point format to integer registers stored in integer format.

[0024] FIG. 3 is one embodiment of a packed data format. The packed data format of FIG. 3 stores four 32-bit numbers (X₀, X₁, X₂, and X₃) as a 128-bit packed data value 300. In such an embodiment, bits 0-31 represent X₀, bits 32-63 represent X₁, bits 64-95 represent X₂, and bits 96-127 represent X₃. In one embodiment, the packed data are stored in floating point registers.

[0025] Packed data operations are performed on two 128-bit packed data values in the format of FIG. 3 with each of the 32-bit values being operated on with the corresponding 32-bit value of the other 128-bit packed data value. For example, to AND two packed data values, bits 0-31 of the two packed data values are ANDed together to result in a 32-bit result value. The other three 32-bit values may be ANDed in parallel to perform four 32-bit AND operations in a single 128-bit operation. Of course, other operations may be performed on packed data, such as additions, subtractions, etc.

[0026] FIG. 4 is one embodiment of the result of a compare operation performed on two packed data values in the format described above with respect to FIG. 3. In the example of FIG. 4, 128-bit packed data value 400 is compared to 128-bit packed data value 410. The result is 128-bit packed data value 420.

[0027] To perform a comparison operation on two 128-bit packed data values, each of the four components of one packed data value is compared to the corresponding component of the other. Packed data value 400 comprises four values labeled X₀, X₁, X₂, and X₃ and packed data value 410 comprises four values labeled Y₀, Y₁, Y₂, and Y₃. Each value in the respective packed data values is compared to a corresponding value in the other packed data value (e.g., X₃ and Y₃).

[0028] Packed data value 420 (Z₀, Z₁, Z₂, and Z₃) stores the result of the compare operation. Each value in packed data value 420 stores the result of the compare operation of the corresponding X and Y values. In one embodiment, each value (e.g., Z₀, Z₁, Z₂, and Z₃) of packed data value 420 stores either 32 set bits, if the corresponding X value is greater than the Y value, or 32 cleared bits, if the corresponding Y value is greater than or equal to the X value. Thus, the result data represented by packed data value 420 stores redundant information. The result information could be stored in four bits, one bit for each of the four 32-bit values stored in the 128-bit result packed data value 420.
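A software model of the packed compare described above may help make the redundancy concrete. The sketch below is not the circuitry of FIG. 2. The types `packed_f32` and `packed_mask` and the function `cmp_gt` are hypothetical names introduced only for illustration, with element 0 standing for X₀ (bits 0-31).

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit packed value: lane[0] is X0 (bits 0-31). */
typedef struct { float lane[4]; } packed_f32;
typedef struct { uint32_t lane[4]; } packed_mask;

/* Lane-wise greater-than compare: each result lane holds 32 set bits
 * when x > y in that lane, and 32 cleared bits otherwise. */
packed_mask cmp_gt(packed_f32 x, packed_f32 y)
{
    packed_mask z;
    for (int i = 0; i < 4; i++)
        z.lane[i] = (x.lane[i] > y.lane[i]) ? 0xFFFFFFFFu : 0u;
    return z;
}
```

Every bit within a result lane carries the same information, so any single bit per lane is enough to convey the outcome of the comparison.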

[0029] In one embodiment, the present invention extracts the most significant bit (MSB), or sign bit, from each result value (e.g., Z₀, Z₁, Z₂, and Z₃) stored in packed data value 420 when the result of a comparison is transferred to integer registers. Of course, a bit other than the most significant bit could be extracted to convey similar information. In one embodiment, the low four bits of an integer register represent the result of the packed data compare operation.
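The extraction of one bit per result value can be sketched in C as below. The function name `move_mask` is hypothetical, the four result values are modeled as an array of 32-bit words with `z[0]` standing for Z₀, and the mapping of Z₀ to the least significant bit is one possible layout chosen for this sketch.

```c
#include <stdint.h>

/* Collect the most significant bit of each 32-bit result value.
 * The bit taken from z[i] is stored in bit i of the returned integer,
 * so the four extracted bits occupy the low four bits. */
uint32_t move_mask(const uint32_t z[4])
{
    uint32_t bits = 0;
    for (int i = 0; i < 4; i++)
        bits |= (z[i] >> 31) << i;
    return bits;
}
```

For a result in which Z₃ and Z₂ are all ones and Z₁ and Z₀ are all zeros, this sketch returns binary 1100.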

[0030] The examples of FIGS. 4 and 5 are described in terms of a compare operation. It is important to note, however, that the floating point operation that provides a result may be any other floating point operation, whether packed or not.

[0031] FIG. 5 is one example of compare, move and branch operations. In the example of FIG. 5, two floating point numbers stored in floating point registers are compared. The result is stored in a third floating point register. Selected bits from the result register are transferred to an integer register. The data in the integer register is then used to evaluate a branch condition or perform an integer operation.

[0032] The example of FIG. 5 may be useful, for example, when evaluating three-dimensional graphics. Many values may be compared to determine whether two objects overlap, touch, etc. In the following example, four values are compared to four other values as part of a packed data compare operation. Of course, other formats of packed data as well as other floating point operations may also be used. The values stored in floating point registers 200 are described in hexadecimal format, while the values stored in integer registers 210 are described in binary format.

[0033] In the following example, the packed data value in floating point register 500 is compared to the packed data value in floating point register 510. The result is stored as a packed data value in floating point register 520. For example, X₃=FF00 and Y₃=F300. Thus, X₃ is greater than Y₃. The result (Z₃=FFFF) is stored in packed data value 520. Other values are compared in a similar manner such that the result from each of the four comparisons is stored in register 520. In one embodiment, floating point comparisons are performed by floating point arithmetic circuitry 205.

[0034] In one embodiment, the most significant bits from each of the result values (e.g., Z₀, Z₁, Z₂, and Z₃) are extracted and transferred, via transfer circuitry 230, to integer register 530. Thus, the binary value 1100 represents the result of the floating point comparison operation and can be used for integer operations such as branching. In the example of FIG. 5, the binary result value 1100 is compared to a conditional binary value 1011 stored in integer register 540. If the condition is true a branch is taken. Otherwise, the branch is not taken. In one embodiment, integer operations are performed by integer arithmetic circuitry 215.
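The extract-and-branch step of FIG. 5 can be modeled in C as follows. Several things here are assumptions made for illustration: the function names are hypothetical, the result values are modeled as full 32-bit words rather than the abbreviated hexadecimal values of the figure, and the branch condition is taken to be "greater than", since the paragraph above does not state which condition is evaluated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Gather the most significant bit of each result value;
 * the bit from z[i] is placed in bit i of the returned integer. */
uint32_t extract_msbs(const uint32_t z[4])
{
    uint32_t bits = 0;
    for (int i = 0; i < 4; i++)
        bits |= (z[i] >> 31) << i;
    return bits;
}

/* Assumed branch condition: the branch is taken when the extracted
 * mask is greater than the conditional value. */
bool branch_taken(const uint32_t z[4], uint32_t condition)
{
    return extract_msbs(z) > condition;
}
```

With Z₃ and Z₂ all ones and Z₁ and Z₀ all zeros, `extract_msbs` gives binary 1100. Under the assumed condition, comparing that value with binary 1011 would take the branch.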

[0035] Performing floating point comparisons in the manner described above is advantageous because the result of the floating point compare is maintained in floating point format and may be used subsequently as a mask for later operations. For example, a logical AND operation may be performed on the result packed data value stored in floating point register 520 and the packed data value stored in floating point register 500 to generate a packed data value with the values that are greater than the values of the packed data value stored in floating point register 510 (e.g., X₃, X₂, 0, 0).

[0036] The value stored in floating point register 520 may be logically complemented and then logically ANDed with the value stored in floating point register 510 to generate a packed data value with the values that are greater than or equal to the corresponding values stored in floating point register 500 (e.g., 0, 0, Y₁, Y₀). The two result values may be logically ORed to generate a packed data value having the greater of the respective values stored in floating point registers 500 and 510 (e.g., X₃, X₂, Y₁, Y₀). Another advantage of the present invention is that branches based on floating point comparisons in processors that support integer branching may be performed more efficiently than would otherwise be possible. For example, assuming that the comparison of floating point values, extraction of bits, and transfer of bits to an integer register is performed by a single instruction (e.g., MOVEMASK), the following instruction sequence may be used to perform a branch based on a floating point comparison:

    Z = MOVEMASK (X, Y)    // compare fp values X and Y, result is int value Z
    COMPARE (Z, V)         // compare int values Z and V
    JUMP GREATER THAN      // jump if Z > V
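The use of the compare result as a mask, described in paragraphs [0035] and [0036], amounts to a lane-wise select. The sketch below models it in C on raw 32-bit patterns. The function name `select_by_mask` is hypothetical, and the arrays stand in for the contents of registers 520, 500 and 510.

```c
#include <stdint.h>

/* Lane-wise select on raw 32-bit patterns:
 *   (mask AND x) keeps x where the mask lane is all ones,
 *   ((NOT mask) AND y) keeps y where the mask lane is all zeros,
 *   and the OR merges the two partial results. */
void select_by_mask(const uint32_t mask[4], const uint32_t x[4],
                    const uint32_t y[4], uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (mask[i] & x[i]) | (~mask[i] & y[i]);
}
```

When the mask comes from a greater-than compare of x and y, the merged result holds the larger value in each lane, and no branch is needed to compute it.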

[0037] Thus, the present invention provides a more compact instruction stream, and therefore more efficient code, when multiple comparisons of floating point values are used to determine a branching condition.

[0038] The present invention has been described with respect to compare and branch instructions. However, extraction of bits and transfer to integer registers may be performed with any floating point number. For example, the present invention may be used to extract sign bits from each value of a packed floating point number. The results may be used for integer operations such as branching or comparisons. Thus, the present invention has a broader application than to only floating point comparisons and integer branches.

[0039] FIG. 6 is one embodiment of a flow diagram for performing a move mask instruction. The process of FIG. 6 is performed on floating point values. In one embodiment, the floating point values are packed floating point values. Alternatively, the floating point values are not packed data values.

[0040] In step 610, a floating point operation is performed on the floating point values. The floating point operation may be, for example, a packed floating point compare, a packed floating point add, a floating point multiply, etc.

[0041] In step 620, one or more bits are extracted from a floating point result register. In one embodiment, the most significant bit of each value of a packed floating point value is extracted. Alternatively, a different bit, such as the least significant bit, may be extracted. Extracting the most significant bit provides the advantage that the most significant bit provides the sign of the floating point number. Of course, bits from non-packed data may also be extracted.

[0042] The extracted bits are placed in a predetermined format in step 630. In one embodiment, the extracted bits are stored in the least significant bits of the integer format. For example, the bit representing Z₀ (shown in FIG. 4) is stored in the least significant bit of the integer format. The bit representing Z₁ (shown in FIG. 4) is stored in the next to least significant bit of the integer format, and so on. Of course, alternative integer formats may be used. For example, the extracted bits may be stored in the most significant bits of the integer format.
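The two placements mentioned above can be contrasted with a short C sketch. Both function names are hypothetical, a 32-bit integer format is assumed, and the order of the bits within the upper four positions in the second function is an assumption, since the text does not specify it.

```c
#include <stdint.h>

/* Place four extracted bits (bit i taken from value Zi) in the
 * least significant bits of a 32-bit integer. */
uint32_t place_low(uint32_t bits4)
{
    return bits4 & 0xFu;
}

/* Alternative format: place the same four bits in the most
 * significant bits (bits 28-31), keeping their relative order. */
uint32_t place_high(uint32_t bits4)
{
    return (bits4 & 0xFu) << 28;
}
```

Either format conveys the same information. The low-bit placement lets the result be compared directly against small integer constants.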

[0043] In step 640, an integer operation is performed based on the extracted bits stored in an integer register. For example, a branch on equal may be performed in response to bits extracted from a floating point operation. Of course, other operations, such as integer compare, integer add, etc. may also be performed on the extracted bits.
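Steps 610 through 640 can be strung together in a small C model. This is a sketch under stated assumptions, not the process of FIG. 6 itself: the floating point operation is assumed to be a lane-wise subtraction, `float` is assumed to be a 32-bit IEEE 754 value, and the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Steps 610 to 630: subtract lane-wise (assumed operation), take the
 * sign bit of each difference, and store the bit for lane i in bit i. */
uint32_t sign_mask_of_difference(const float x[4], const float y[4])
{
    uint32_t bits = 0;
    for (int i = 0; i < 4; i++) {
        float d = x[i] - y[i];          /* step 610: floating point operation */
        uint32_t raw;
        memcpy(&raw, &d, sizeof raw);   /* view the result's bit pattern */
        bits |= (raw >> 31) << i;       /* steps 620 and 630: extract and place */
    }
    return bits;
}

/* Step 640: an integer operation on the extracted bits, here a
 * branch-on-equal test that returns 1 when the branch would be taken. */
int branch_on_equal(uint32_t bits, uint32_t expected)
{
    return bits == expected;
}
```

The differences themselves remain available as floating point values in a real implementation. Only their sign bits are copied into the integer domain.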

[0044] Thus, the present invention provides a method and apparatus for performing integer operations based on floating point values without losing the floating point value. This leaves the floating point value for later floating point operations, should subsequent operations be performed. The present invention thereby provides more compact code by transferring information to integer registers for integer operations and by maintaining floating point values for possible subsequent floating point operations.

[0045] In the foregoing specification, the present invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

What is claimed is:
 1. In a computer system, a method comprising the computer-implemented steps of: performing an operation on data stored in a first format; extracting data from a result of the operation stored in the first format, wherein the data includes a set of one or more bits, each bit in the set of one or more bits representing multiple redundant bits in the result; transferring the set of one or more bits to a second format; and performing an operation in response to the set of one or more bits.
 2. The method of claim 1, wherein the first format is packed floating point data.
 3. The method of claim 1, wherein the second format is integer data.
 4. The method of claim 1, wherein the step of performing an operation on packed floating point data comprises performing a comparison of two sets of packed floating point data.
 5. The method of claim 1, wherein the step of extracting data comprises setting a bit in a result mask register equal to a corresponding most significant bit of each associated packed floating point data value.
 6. The method of claim 1, wherein the step of transferring comprises transferring the set of one or more bits from a floating point register to an integer register.
 7. The method of claim 1, wherein the step of performing an operation in response to the set of one or more bits comprises performing a branch operation in response to the set of one or more bits.
 8. A circuit comprising: a first set of registers that store data in a first format; a first arithmetic unit coupled to the first set of registers, wherein the first arithmetic unit performs compare operations on data stored in the first set of registers and stores a result in a register in the first set of registers, and further wherein the first arithmetic unit extracts a set of bits from the result, where each bit in the set of bits represents a redundant set of bits stored in the result; a second set of registers storing data in a second format; a transfer circuit coupled between the first set of registers and the second set of registers, the transfer circuit transferring the set of bits to a register in the first set of registers; and a second arithmetic unit that performs operations in response to the set of bits stored in the second set of registers.
 9. The circuit of claim 8, wherein the set of registers are floating point registers.
 10. The circuit of claim 8, wherein the second set of registers are integer registers.
 11. An apparatus comprising: means for performing an operation on data stored in a first format; means for extracting data from a result of the operation stored in the first format, wherein the data includes a set of one or more bits, each bit in the set of one or more bits representing multiple redundant bits in the result; means for transferring the set of one or more bits to a second format; and means for performing an operation in response to the set of one or more bits.
 12. The apparatus of claim 11, wherein the means for performing an operation on packed floating point data comprises means for performing a comparison of two sets of packed floating point data.
 13. The apparatus of claim 11, wherein the means for extracting data comprises setting a bit in a result mask register equal to a corresponding most significant bit of each associated packed floating point data value.
 14. The apparatus of claim 11, wherein the means for transferring comprises means for transferring the set of one or more bits from a floating point register to an integer register.
 15. The apparatus of claim 11, wherein the means for performing an operation in response to the set of one or more bits comprises means for performing a branch operation in response to the set of one or more bits.
 16. A graphics display system comprising: a bus; a display device coupled to the bus; and a processor coupled to the display device, the processor having a plurality of registers that store floating point data and integer data, the processor further comprising circuitry that extracts one or more bits of data from one of the registers that stores floating point data and transfers the extracted bits to an integer register to perform an integer operation, wherein the processor causes the display device to change what is displayed in response to the integer operation.
 17. The graphics display system of claim 16, wherein the floating point data represents a portion of what is displayed by the display device.
 18. The graphics display system of claim 16, wherein the floating point data comprises packed floating point data.
 19. The graphics display system of claim 18, wherein the processor extracts a most significant bit from each value represented by the packed floating point data.
 20. The graphics display system of claim 16, wherein the integer operation is a branch operation.